Replication Data for: A Reproduction of “Do Female Officers Police Differently? Evidence from Traffic Stops”

File Description: Reproduction ReadMe File
Author: Dianyi Yang
Last Edited: September 19, 2023


****************************************************************************************************************

SOFTWARE REQUIREMENTS: R (4.2.2) was used.

****************************************************************************************************************

COMPUTING ENVIRONMENT: Analyses were run on a Huawei Matebook 13 (2021 Intel) using Windows 11 with 16GB of memory. 

****************************************************************************************************************

RUNTIME: The three R scripts (Steps 1-3) provided by the original authors should take less than 2 hours to run in total. The first and third each take less than 30 minutes, while the second takes less than an hour. They were run (by the original authors) on a Mac laptop with 16GB of RAM. The fourth R script (Step 4) should, on its own, take about an hour and a half to run. Each of the four R scripts uses approximately 12GB of RAM. At the end of running all three scripts, the code, data, models, and logs occupy just under 17GB of hard drive space.

****************************************************************************************************************

A NOTE ON DATA SOURCES: The raw, original files for traffic stops from both the Charlotte Police Department and Florida Highway Patrol are included as they are public record.

****************************************************************************************************************

CODE REQUIREMENTS: 

The original code (Steps 1-3) relies on the following R packages: dplyr, ggplot2, texreg, readr, pscl, and arm. Within each of these R scripts, the user will need to update the working directory to the local directory where the files and folders have been downloaded.

The reproduction improvement code (Step 4) relies on the following additional R packages: rstudioapi, modelsummary, fixest, car, kableExtra, alpaca, margins, stats, estimatr, plm, arsenal, lme4, modelr, foreign, compiler, and fwildclusterboot. The user does not need to update the working directory manually, as long as the script remains in the "Code" folder of the downloaded reproduction files.
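
Before running the scripts, it may help to check which of the packages listed above are missing. The following is a minimal sketch, not part of the replication code: the helper name missing_packages is hypothetical, package names use their CRAN spelling, and the install line is left commented out because it assumes every package is available from the configured repository for your R version.

```r
# Packages named in this readme. "stats" and "compiler" ship with R itself,
# so they are left out of the install list.
original_pkgs <- c("dplyr", "ggplot2", "texreg", "readr", "pscl", "arm")
step4_pkgs <- c("rstudioapi", "modelsummary", "fixest", "car", "kableExtra",
                "alpaca", "margins", "estimatr", "plm", "arsenal", "lme4",
                "modelr", "foreign", "fwildclusterboot")

# Hypothetical helper: return the packages in `pkgs` that are not installed
# in the current R library.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(c(original_pkgs, step4_pkgs))
print(to_install)

# Uncomment to install. Assumption: all packages can be installed from the
# configured repository; some may need to come from another source.
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means all listed packages are already installed.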

****************************************************************************************************************
FOLDER STRUCTURE:

In the main folder (<ROOT>), the user will find this readme file (readme.txt) and the codebook (Codebook.pdf).

The main reproduction folder contains four subfolders:

1. Code
(Original Paper)
- Step1_MainAnalysisAndData.R: This R script cleans the raw data sets to produce three cleaned data sets used for the analysis and a summary dataset used to create the first figure, which are saved in the Data folder. It then runs the analysis from the body of the paper. Model output is saved and stored in the Data folder. 
- Step2_AppendixAnalysis.R: This R script runs the regressions which are shown in the appendix. Model output is saved and stored in the Data folder. 
- Step3_TablesAndFigures.R: This R script calls on the data sets and model output produced in the previous two R scripts to produce the tables and figures seen in the paper and in the appendix. Figures produced are saved to the Figures folder.
(Reproduction Paper)
- Step4_Replication.R: This R script carries out the additional regressions and produces the alternative tables and figures reported in the reproduction paper.
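
The scripts are meant to be run in the order listed, since later steps read files written by earlier ones. A minimal sketch of running them in sequence and recording the time each takes is given below. The helper run_in_order and the "Code" path in the example call are assumptions, not part of the replication files. Steps 1-3 still need their working directory edited first, and Step 4 may need to be opened and run from RStudio instead if it uses rstudioapi to locate itself.

```r
# Script names as listed in this readme.
scripts <- c("Step1_MainAnalysisAndData.R",
             "Step2_AppendixAnalysis.R",
             "Step3_TablesAndFigures.R",
             "Step4_Replication.R")

# Hypothetical helper: source each script in order and return the elapsed
# time per script in minutes. Stops before running anything if a file is
# missing.
run_in_order <- function(code_dir, files = scripts) {
  paths <- file.path(code_dir, files)
  names(paths) <- files
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Script(s) not found: ", paste(not_found, collapse = ", "))
  }
  vapply(paths, function(p) {
    elapsed <- system.time(source(p))[["elapsed"]]
    elapsed / 60
  }, numeric(1))
}

# Example call (hypothetical path; expect a run of several hours):
# timings <- run_in_order("Code")
```

The returned vector is named by script, which makes it easy to compare against the runtimes stated in the RUNTIME section.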

2. OutputLogs
- Step1_MainAnalysisAndData.html: This is the saved output and log of the first R script from the last execution of this file. (by original authors)
- Step2_AppendixAnalysis.html: This is the saved output and log of the second R script from the last execution of this file. (by original authors)
- Step3_TablesAndFigures.html: This is the saved output and log of the third R script from the last execution of this file. Figures produced are saved to the Figures folder and are not shown in this file. (by original authors)
- Step4_Replication.html: This is the saved output and log of the fourth R script. Figures produced are saved to the Figures folder and are also shown in the HTML file. (by us)

3. Data
- Officer_Traffic_Stops_Original.csv: This data set was downloaded from Baumgartner et al.’s replication files and was originally accessed through the city of Charlotte’s open data portal. It is the raw data file.
- Officer_Traffic_Stop_Update.csv: This data set was downloaded directly from Charlotte’s open data portal. It is the raw data file. 
- fl_statewide_2019_08_13.csv: This data set was downloaded from the Stanford Open Policing Project. It is the raw data file. 
- NorthCarolina.RData: This is the cleaned data set produced in the Step 1 R Script for Charlotte, North Carolina. It is the data set directly used for all of the analysis. The codebook (section one) corresponds with this data set. 
- FloridaLarge.RData: This is the full, cleaned data set produced in the Step 1 R Script for Florida. 
- FloridaSmall.RData: This is the slimmed down and cleaned data set produced in the Step 1 R Script for Florida. It is the data set directly used for all of the analysis. The codebook (section two) corresponds with this data set. 
- FL_Aggregated.RData (and FL_Aggregated-2.RData): Aggregated Florida data. The codebook (section three) corresponds with this data set.
- Fig1_Data.RData (and Fig1_Data-2.tab): Data set used to create the first figure. 
- CMPD_Employee_Demographics-2.tab: Data set used to calculate the number of officers hired by gender by the Charlotte Police Department. 
- All other files in this folder are regression output: they correspond to the model output from the regressions fit in the Steps 1 and 2 R Scripts. 
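
To inspect one of the .RData files without overwriting objects in the current workspace, the file can be loaded into its own environment. This is a minimal sketch: the helper load_into_env is hypothetical, the example path assumes the working directory is the main reproduction folder, and the names of the objects stored in each file are not documented here, so they are listed rather than assumed.

```r
# Hypothetical helper: load an .RData file into a fresh environment and
# return that environment, leaving the global workspace untouched.
load_into_env <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Example (assumes the working directory is the main reproduction folder):
# nc <- load_into_env(file.path("Data", "NorthCarolina.RData"))
# ls(nc)                     # names of the stored objects
# str(get(ls(nc)[1], nc))    # structure of the first stored object
```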

4. Figures 
- Fig1_PredProp.png: Figure 1 from the original paper. 
- Fig2_PredProp.png: Figure 2 from the original paper.
- Fig1a_FrequencyByDivision.pdf: Figure 1a from the reproduction paper.
- Fig1b_DivisionEffects.pdf: Figure 1b from the reproduction paper.
- Fig2_PredProb_correction.pdf: Figure 2 from the reproduction paper.
- Fig3_PredHit_100stops.pdf: Figure 3 from the reproduction paper.